EDA with Time Series Data

Import Packages and Load Data

Inspect Data

Check Missing Values and Duplicates

The data are already tidy (one observation per row, one variable per column), so no reshaping is needed.
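Below is a minimal sketch of the checks behind this step. It assumes a pandas DataFrame with a `timestamp` column and a `value` column. Both column names, the helper `check_quality`, and the toy frame are hypothetical and made up for illustration.

```python
import pandas as pd

def check_quality(df: pd.DataFrame, time_col: str = "timestamp") -> dict:
    """Summarise missing values and duplicates in a time-series frame (hypothetical helper)."""
    return {
        # Number of NaN cells in each column
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows that are exact copies of an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Timestamps that appear more than once (two readings for the same instant)
        "duplicate_timestamps": int(df[time_col].duplicated().sum()),
    }

# Hypothetical toy frame: one repeated row and one missing value
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:30",
                                 "2024-01-01 00:30", "2024-01-01 01:00"]),
    "value": [10.0, 8.0, 8.0, None],
})
report = check_quality(toy)
print(report)
```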

Modeling

Based on my experience, the number of taxi passengers usually varies from hour to hour rather than from day to day, so I use a single day as the window size.
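In pandas, a one-day window can be written either as a time offset or as a sample count. This sketch assumes a 30-minute sampling interval, which you should adjust to the real data, and uses a synthetic series.

```python
import numpy as np
import pandas as pd

# Hypothetical series sampled every 30 minutes (the real sampling rate is an assumption)
idx = pd.date_range("2024-01-01", periods=48 * 3, freq="30min")
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Time-based window: each value averages the observations in the preceding 24 hours.
# For offset windows, min_periods defaults to 1, so the first day gets partial averages.
sma_1d = s.rolling("1D").mean()

# Equivalent count-based window when the sampling interval is fixed
samples_per_day = int(pd.Timedelta("1D") / pd.Timedelta("30min"))  # 48
sma_48 = s.rolling(samples_per_day).mean()  # NaN until 48 samples are available
```

The time-based form still covers exactly one day if the index has gaps. The count-based form leaves the first day as NaN instead of averaging a partial window.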

Simple Moving Average (SMA)

From the time series plot, we see that the series repeats its pattern over a short period.

Based on the histogram (histplot) above, we notice that this dataset is negatively (left) skewed.
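Skewness can also be checked numerically. This minimal sketch applies pandas `Series.skew()` to made-up values. A negative result indicates a longer left tail, and in that case the mean typically sits below the median.

```python
import pandas as pd

# Hypothetical toy values with a long left tail (a few unusually low counts)
values = pd.Series([1.0, 8.0, 9.0, 9.0, 10.0])

skewness = values.skew()  # sample skewness; < 0 means negatively (left) skewed
print(f"skew={skewness:.3f}, mean={values.mean():.1f}, median={values.median():.1f}")
```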

Using a tolerance band around the SMA, we can see some anomalies, i.e. points that fall outside the band.
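One common way to build such a band is the rolling mean ± k rolling standard deviations. The notebook does not show its own construction, so treat this as an assumption. The sketch computes the band from the preceding `window` points only, so a spike cannot widen its own band. The helper `sma_band`, the choice `k=3`, and the toy series are hypothetical. For the real data, a one-day window at a 30-minute interval would be `window=48`.

```python
import pandas as pd

def sma_band(s: pd.Series, window: int, k: float = 3.0) -> pd.DataFrame:
    """Flag points outside rolling mean ± k·rolling std of the previous `window` points."""
    # shift(1) makes the band depend only on past observations
    past = s.shift(1).rolling(window)
    center = past.mean()
    spread = past.std()  # sample standard deviation (ddof=1)
    out = pd.DataFrame({"value": s, "sma": center,
                        "lower": center - k * spread,
                        "upper": center + k * spread})
    # During warm-up the bounds are NaN; comparisons with NaN are False, so nothing is flagged
    out["anomaly"] = (s < out["lower"]) | (s > out["upper"])
    return out

# Hypothetical toy series: alternating 10/12 with one injected spike at position 30
s = pd.Series([10.0 if i % 2 == 0 else 12.0 for i in range(60)])
s.iloc[30] = 40.0
band = sma_band(s, window=8, k=3.0)
print(band[band["anomaly"]])
```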

Exponential Smoothing Method (ES)

Under the ES method, the shape of the distribution is close to normal.

Using the tolerance band defined by the ES method, we see fewer anomalies.
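Here is a matching sketch for an ES band. It assumes simple exponential smoothing implemented with pandas `ewm`. The notebook's exact ES variant and parameters are not shown, so `alpha=0.3`, `k=3`, the warm-up length, and the helper `es_band` are all hypothetical. Applied to the shifted series with `adjust=False`, the exponentially weighted mean is the one-step-ahead simple exponential smoothing forecast.

```python
import pandas as pd

def es_band(s: pd.Series, alpha: float = 0.3, k: float = 3.0, warmup: int = 8) -> pd.DataFrame:
    """Flag points outside level ± k·spread, both estimated by exponential weighting."""
    # shift(1): the level at time t is smoothed from observations up to t-1
    past = s.shift(1).ewm(alpha=alpha, adjust=False, min_periods=warmup)
    level = past.mean()   # simple exponential smoothing of past values
    spread = past.std()   # exponentially weighted standard deviation
    out = pd.DataFrame({"value": s, "level": level,
                        "lower": level - k * spread,
                        "upper": level + k * spread})
    # NaN bounds during warm-up compare as False, so those points are not flagged
    out["anomaly"] = (s < out["lower"]) | (s > out["upper"])
    return out

# Hypothetical toy series: alternating 10/12 with one injected spike at position 30
s = pd.Series([10.0 if i % 2 == 0 else 12.0 for i in range(60)])
s.iloc[30] = 40.0
band = es_band(s)
print(band[band["anomaly"]])
```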

Seasonal-Trend Decomposition (STD) Method

The trend is relatively stable in the first half, but it varies a little toward the end.

It is hard to see any clear pattern if we look at all data points at once, so we pull out only 10 periods. Over that window, the seasonality looks steady.

Note: adding the trend and seasonal components gives the fitted (predicted) y values. Adding the residual as well recovers the observed y values.
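To make the additive relationship concrete, here is a minimal classical additive decomposition in pandas. It is a simplified stand-in and not necessarily the method used in the notebook; STL in statsmodels is a common alternative. The trend uses a plain centered rolling mean, whereas classical decomposition uses a 2×m moving average for even periods such as 48. The toy series and the helper `classical_decompose` are hypothetical.

```python
import numpy as np
import pandas as pd

def classical_decompose(y: pd.Series, period: int):
    """Simplified additive decomposition: y = trend + seasonal + resid (hypothetical helper)."""
    # Trend: centered moving average spanning one period (NaN near both ends)
    trend = y.rolling(period, center=True).mean()
    # Seasonal: average detrended value at each position within the cycle
    phase = np.arange(len(y)) % period
    seasonal_means = (y - trend).groupby(phase).mean()
    seasonal_means -= seasonal_means.mean()  # make the cycle sum to zero
    seasonal = pd.Series(seasonal_means.to_numpy()[phase], index=y.index)
    # Residual: whatever trend and seasonality do not explain
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Hypothetical toy series: linear trend + 5-step cycle + small noise
period = 5
pattern = np.array([0.0, 3.0, 1.0, -2.0, -2.0])
t = np.arange(50)
noise = np.random.default_rng(0).normal(0.0, 0.1, size=t.size)
y = pd.Series(0.2 * t + pattern[t % period] + noise)

trend, seasonal, resid = classical_decompose(y, period)
fitted = trend + seasonal        # the fitted ("predicted") y values
recovered = fitted + resid       # equals y wherever the trend is defined
```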

From this view, we can clearly see where the anomalies are and how far they are from the mean.
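This sketch measures "how far" each point is by standardizing the residuals into z-scores and flagging those beyond a threshold. The threshold of 3, the helper name, and the toy residuals are assumptions. A median/MAD scale is a more outlier-resistant alternative to the mean and standard deviation used here.

```python
import numpy as np
import pandas as pd

def residual_zscores(resid: pd.Series, threshold: float = 3.0) -> pd.DataFrame:
    """Express each residual in standard deviations from the residual mean (hypothetical helper)."""
    r = resid.dropna()  # decomposition residuals are NaN at the series edges
    z = (r - r.mean()) / r.std()
    return pd.DataFrame({"resid": r, "z": z, "anomaly": z.abs() > threshold})

# Hypothetical residuals: small oscillation plus one large deviation at position 20
resid = pd.Series(0.5 * np.sin(np.arange(50)))
resid.iloc[20] = 10.0
scores = residual_zscores(resid)
print(scores[scores["anomaly"]])
```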

The Facebook Prophet Module

The trend line appears to predict well.

Another Prophet model is trained on 12 weeks of data and predicts the following 3 weeks.
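Here is a sketch of the 12-week / 3-week split. It assumes a 30-minute sampling interval and assumes that the 3 weeks after the training window are held out for comparison, since the text does not say which 3 weeks were predicted. Prophet expects a DataFrame with columns `ds` and `y`. The Prophet calls are left as comments so the block runs without the library installed. `make_prophet_split`, the toy series, and the `n_changepoints` value are hypothetical.

```python
import numpy as np
import pandas as pd

FREQ = "30min"  # assumed sampling interval

def make_prophet_split(s: pd.Series, start, train_weeks: int = 12, horizon_weeks: int = 3):
    """Return (train, test) frames in Prophet's ds/y layout (hypothetical helper)."""
    start = pd.Timestamp(start)
    cut = start + pd.Timedelta(weeks=train_weeks)
    stop = cut + pd.Timedelta(weeks=horizon_weeks)

    def to_prophet(part: pd.Series) -> pd.DataFrame:
        # Prophet requires a datetime column "ds" and a value column "y"
        return part.rename_axis("ds").reset_index(name="y")

    train = to_prophet(s[(s.index >= start) & (s.index < cut)])
    test = to_prophet(s[(s.index >= cut) & (s.index < stop)])
    return train, test

# Hypothetical toy series: 20 weeks of half-hourly values with a daily cycle
idx = pd.date_range("2024-01-01", periods=20 * 7 * 48, freq=FREQ)
s = pd.Series(np.sin(np.arange(len(idx)) * 2 * np.pi / 48), index=idx)

train_df, test_df = make_prophet_split(s, start="2024-01-01")
periods = len(test_df)  # 3 weeks of 30-minute steps

# With Prophet installed (not run here):
# from prophet import Prophet
# m = Prophet(n_changepoints=50)  # more changepoints than the default 25; value is illustrative
# m.fit(train_df)
# future = m.make_future_dataframe(periods=periods, freq=FREQ)
# forecast = m.predict(future)  # compare forecast["yhat"].tail(periods) with test_df["y"]
```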

With a shorter training period and more changepoints, the prediction looks reasonable.

Model Diagnostics

I am not picky here, since this dataset is not a good example for time series analysis, but I hope to generate better predictions next time.
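This is a minimal sketch of point-forecast error metrics (MAE, RMSE, MAPE) that could support model diagnostics and a later comparison. The helper name and the toy numbers are hypothetical. Prophet also provides `cross_validation` and `performance_metrics` in `prophet.diagnostics` for rolling-origin evaluation.

```python
import numpy as np

def forecast_errors(y_true, y_pred) -> dict:
    """Common point-forecast error metrics (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "mae": float(np.mean(np.abs(err))),          # mean absolute error
        "rmse": float(np.sqrt(np.mean(err ** 2))),   # root mean squared error
        # MAPE in percent; undefined if y_true contains zeros
        "mape": float(np.mean(np.abs(err / y_true)) * 100),
    }

metrics = forecast_errors([100, 200, 300], [110, 190, 330])
print(metrics)
```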

Model Comparison

Conclusion